Usenet 1993 July

home *** CD-ROM | disk | FTP | other *** search

/ Usenet 1993 July / InfoMagic USENET CD-ROM July 1993.ISO / sources / unix / volume14 / flex / part03 < prev next >

Wrap

Internet Message Format | 1988-05-02 | 48.8 KB

Subject: v14i081: Flex, a lex replacement, Part03/05 Newsgroups: comp.sources.unix Sender: sources Approved: rsalz@uunet.UU.NET Submitted-by: Vern Paxson <vern@lbl-csam.arpa> Posting-number: Volume 14, Issue 81 Archive-name: flex/part03 #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of archive 3 (of 5)." # Contents: flex.1 flexdef.h nfa.c # Wrapped by rsalz@fig.bbn.com on Tue May 3 17:31:30 1988 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f 'flex.1' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'flex.1'\" else echo shar: Extracting \"'flex.1'\" $15558 characters$ sed "s/^X//" >'flex.1' <<'END_OF_FILE' X.TH FLEX 1 "13 May 1987" X.SH NAME flex - fast lexical analyzer generator X.SH SYNOPSIS X.B flex X[ X.B -dfirstvFILT -c[efmF] -Sskeleton_file X] [ X.I filename X] X.SH DESCRIPTION X.I flex is a rewrite of X.I lex intended to right some of that tool's deficiencies: in particular, X.I flex generates lexical analyzers much faster, and the analyzers use smaller tables and run faster. X.SH OPTIONS In addition to lex's X.B -t flag, flex has the following options: X.TP X.B -d makes the generated scanner run in X.I debug mode. Whenever a pattern is recognized the scanner will write to X.I stderr a line of the form: X.nf X X --accepting rule #n X X.fi Rules are numbered sequentially with the first one being 1. X.TP X.B -f has the same effect as lex's -f flag (do not compress the scanner tables); the mnemonic changes from X.I fast compilation to (take your pick) X.I full table or X.I fast scanner. The actual compilation takes X.I longer, since flex is I/O bound writing out the big table. X.IP This option is equivalent to X.B -cf X(see below). X.TP X.B -i instructs flex to generate a X.I case-insensitive scanner. The case of letters given in the flex input patterns will be ignored, and the rules will be matched regardless of case. The matched text given in X.I yytext will have the preserved case (i.e., it will not be folded). X.TP X.B -r specifies that the scanner uses the X.B REJECT action. X.TP X.B -s causes the X.I default rule X(that unmatched scanner input is echoed to X.I stdout) to be suppressed. If the scanner encounters input that does not match any of its rules, it aborts with an error. This option is useful for finding holes in a scanner's rule set. X.TP X.B -v has the same meaning as for lex (print to X.I stderr a summary of statistics of the generated scanner). Many more statistics are printed, though, and the summary spans several lines. Most of the statistics are meaningless to the casual flex user. X.TP X.B -F specifies that the X.ul fast scanner table representation should be used. This representation is about as fast as the full table representation X.ul X(-f), and for some sets of patterns will be considerably smaller (and for others, larger). In general, if the pattern set contains both "keywords" and a catch-all, "identifier" rule, such as in the set: X.nf X X "case" return ( TOK_CASE ); X "switch" return ( TOK_SWITCH ); X ... X "default" return ( TOK_DEFAULT ); X [a-z]+ return ( TOK_ID ); X X.fi then you're better off using the full table representation. If only the "identifier" rule is present and you then use a hash table or some such to detect the keywords, you're better off using X.ul X-F. X.IP This option is equivalent to X.B -cF X(see below). X.TP X.B -I instructs flex to generate an X.I interactive scanner. Normally, scanners generated by flex always look ahead one character before deciding that a rule has been matched. At the possible cost of some scanning overhead (it's not clear that more overhead is involved), flex will generate a scanner which only looks ahead when needed. Such scanners are called X.I interactive because if you want to write a scanner for an interactive system such as a command shell, you will probably want the user's input to be terminated with a newline, and without X.B -I the user will have to type a character in addition to the newline in order to have the newline recognized. This leads to dreadful interactive performance. X.IP If all this seems to confusing, here's the general rule: if a human will be typing in input to your scanner, use X.B -I, otherwise don't; if you don't care about how fast your scanners run and don't want to make any assumptions about the input to your scanner, always use X.B -I. X.IP Note, X.B -I cannot be used in conjunction with X.I full or X.I fast tables, i.e., the X.B -f, -F, -cf, or X.B -cF flags. X.TP X.B -L instructs flex to not generate X.B #line directives (see below). X.TP X.B -T makes flex run in X.I trace mode. It will generate a lot of messages to standard out concerning the form of the input and the resultant non-deterministic and deterministic finite automatons. This option is mostly for use in maintaining flex. X.TP X.B -c[efmF] controls the degree of table compression. X.B -ce directs flex to construct X.I equivalence classes, i.e., sets of characters which have identical lexical properties (for example, if the only appearance of digits in the flex input is in the character class X"[0-9]" then the digits '0', '1', ..., '9' will all be put in the same equivalence class). X.B -cf specifies that the X.I full scanner tables should be generated - flex should not compress the tables by taking advantages of similar transition functions for different states. X.B -cF specifies that the alternate fast scanner representation (described above under the X.B -F flag) should be used. X.B -cm directs flex to construct X.I meta-equivalence classes, which are sets of equivalence classes (or characters, if equivalence classes are not being used) that are commonly used together. A lone X.B -c specifies that the scanner tables should be compressed but neither equivalence classes nor meta-equivalence classes should be used. X.IP The options X.B -cf or X.B -cF and X.B -cm do not make sense together - there is no opportunity for meta-equivalence classes if the table is not being compressed. Otherwise the options may be freely mixed. X.IP The default setting is X.B -cem which specifies that flex should generate equivalence classes and meta-equivalence classes. This setting provides the highest degree of table compression. You can trade off faster-executing scanners at the cost of larger tables with the following generally being true: X.nf X X slowest smallest X -cem X -ce X -cm X -c X -c{f,F}e X -c{f,F} X fastest largest X X.fi X.TP X.B -Sskeleton_file overrides the default skeleton file from which flex constructs its scanners. You'll never need this option unless you are doing flex maintenance or development. X.SH INCOMPATIBILITIES WITH LEX X.I flex is fully compatible with X.I lex with the following exceptions: X.IP - There is no run-time library to link with. You needn't specify X.I -ll when linking, and you must supply a main program. (Hacker's note: since the lex library contains a main() which simply calls yylex(), you actually X.I can be lazy and not supply your own main program and link with X.I -ll.) X.IP - lex's X.B %r X(Ratfor scanners) and X.B %t X(translation table) options are not supported. X.IP - The do-nothing X.ul X-n flag is not supported. X.IP - When definitions are expanded, flex encloses them in parentheses. With lex, the following X.nf X X NAME [A-Z][A-Z0-9]* X %% X foo{NAME}? printf( "Found it\\n" ); X %% X X.fi will not match the string "foo" because when the macro is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is associated with X"[A-Z0-9]*". With flex, the rule will be expanded to X"foo([A-z][A-Z0-9]*)?" and so the string "foo" will match. X.IP - X.B yymore() is not supported. X.IP - The undocumented lex-scanner internal variable X.B yylineno is not supported. X.IP - If your input uses X.B REJECT, you must run flex with the X.B -r flag. If you leave out the flag, the scanner will abort at run-time with a message that the scanner was compiled without the flag being specified. X.IP - The X.B input() routine is not redefinable, though may be called to read characters following whatever has been matched by a rule. If X.B input() encounters and end-of-file the normal X.B yywrap() processing is done. A ``real'' end-of-file is returned as X.I EOF. X.IP Input can be controlled by redefining the X.B YY_INPUT macro. YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its action is to place up to max_size characters in the character buffer "buf" and return in the integer variable "result" either the number of characters read or the constant YY_NULL (0 on Unix systems) systems) to indicate EOF. The default YY_INPUT reads from the file-pointer "yyin" (which is by default X.I stdin), so if you just want to change the input file, you needn't redefine YY_INPUT - just point yyin at the input file. X.IP A sample redefinition of YY_INPUT (in the first section of the input file): X.nf X X %{ X #undef YY_INPUT X #define YY_INPUT(buf,result,max_size) \\ X result = (buf[0] = getchar()) == EOF ? YY_NULL : 1; X %} X X.fi You also can add in things like counting keeping track of the input line number this way; but don't expect your scanner to go very fast. X.IP - X.B output() is not supported. Output from the ECHO macro is done to the file-pointer X"yyout" (default X.I stdout). X.IP - Trailing context is restricted to patterns which have either a fixed-sized leading part or a fixed-sized trailing part. XFor example, "a*/b" and "a/b*" are okay, but not "a*/b*". This restriction is due to a bug in the trailing context algorithm given in X.I Principles of Compiler Design X(and X.I Compilers - Principles, Techniques, and Tools) which can result in mismatches. Try the following lex program X.nf X X %% X x+/xy printf( "I found \\"%s\\"\\n", yytext ); X X.fi on the input "xxy". (If anyone knows of a fast algorithm for finding the beginning of trailing context for an arbitrary pair of regular expressions, please let me know!) If you must have arbitrary trailing context, you can use X.B yyless() to effect it. X.IP - flex reads only one input file, while lex's input is made up of the concatenation of its input files. X.SH ENHANCEMENTS X.IP - X.I Exclusive start-conditions can be declared by using X.B %x instead of X.B %s. These start-conditions have the property that when they are active, X.I no other rules are active. Thus a set of rules governed by the same exclusive start condition describe a scanner which is independent of any of the other rules in the flex input. This feature makes it easy to specify "mini-scanners" which scan portions of the input that are syntactically different from the rest (e.g., comments). X.IP - flex dynamically resizes its internal tables, so directives like "%a 3000" are not needed when specifying large scanners. X.IP - The scanning routine generated by flex is declared using the macro X.B YY_DECL. By redefining this macro you can change the routine's name and its calling sequence. For example, you could use: X.nf X X #undef YY_DECL X #define YY_DECL float lexscan( a, b ) float a, b; X X.fi to give it the name X.I lexscan, returning a float, and taking two floats as arguments. X.IP - flex generates X.B #line directives mapping lines in the output to their origin in the input file. X.IP - You can put multiple actions on the same line, separated with semi-colons. With lex, the following X.nf X X foo handle_foo(); return 1; X X.fi is truncated to X.nf X X foo handle_foo(); X X.fi flex does not truncate the action. Actions that are not enclosed in braces are terminated at the end of the line. X.IP - Actions can be begun with X.B %{ and terminated with X.B %}. In this case, flex does not count braces to figure out where the action ends - actions are terminated by the closing X.B %}. This feature is useful when the enclosed action has extraneous braces in it (usually in comments or inside inactive #ifdef's) that throw off the brace-count. X.IP - All of the scanner actions (e.g., X.B ECHO, yywrap ...) except the X.B unput() and X.B input() routines, are written as macros, so they can be redefined if necessary without requiring a separate library to link to. X.SH FILES X.TP X.I flex.skel skeleton scanner X.TP X.I flex.fastskel skeleton scanner for -f and -F X.TP X.I flexskelcom.h common definitions for skeleton files X.TP X.I flexskeldef.h definitions for compressed skeleton file X.TP X.I fastskeldef.h definitions for -f, -F skeleton file X.SH "SEE ALSO" X.LP lex(1) X.LP M. E. Lesk and E. Schmidt, X.I LEX - Lexical Analyzer Generator X.SH AUTHOR Vern Paxson, with the help of many ideas and much inspiration from Van Jacobson. Original version by Jef Poskanzer. Fast table representation is a partial implementation of a design done by Van Jacobson. The implementation was done by Kevin Gong and Vern Paxson. X.LP Thanks to the many flex beta-testers, especially Casey Leedom, Nick Christopher, Chris Faylor, Eric Goldman, Craig Leres, Mohamed el Lozy, Esmond Pitt, Jef Poskanzer, and Dave Tallman. Thanks to John Gilmore, Bob Mulcahy, Rich Salz, and Richard Stallman for help with various distribution headaches. X.LP Send comments to: X.nf X X Vern Paxson X Real Time Systems X Bldg. 46A X Lawrence Berkeley Laboratory X 1 Cyclotron Rd. X Berkeley, CA 94720 X X (415) 486-6411 X X vern@lbl-{csam,rtsg}.arpa X ucbvax!lbl-csam.arpa!vern X X.fi X.SH DIAGNOSTICS X.LP X.I flex scanner jammed - a scanner compiled with X.B -s has encountered an input string which wasn't matched by any of its rules. X.LP X.I flex input buffer overflowed - a scanner rule matched a string long enough to overflow the scanner's internal input buffer (as large as X.B BUFSIZ in "/usr/include/stdio.h"). You can edit X.I flexskelcom.h and increase X.B YY_BUF_SIZE and X.B YY_MAX_LINE to increase this limit. X.LP X.I REJECT used and scanner was X.I not generated using -r - just like it sounds. Your scanner uses X.B REJECT. You must run flex on the scanner description using the X.B -r flag. X.LP X.I old-style lex command ignored - the flex input contains a lex command (e.g., "%n 1000") which is being ignored. X.SH BUGS X.LP Use of unput() or input() trashes the current yytext and yyleng. X.LP Use of unput() to push back more text than was matched can result in the pushed-back text matching a beginning-of-line ('^') rule even though it didn't come at the beginning of the line. X.LP Nulls are not allowed in flex inputs or in the inputs to scanners generated by flex. Their presence generates fatal errors. X.LP Do not mix trailing context with the '|' operator used to specify that multiple rules use the same action. That is, avoid constructs like: X.nf X X foo/bar | X bletch | X bugprone { ... } X X.fi They can result in subtle mismatches. This is actually not a problem if there is only one rule using trailing context and it is the first in the list (so the above example will actually work okay). The problem is due to fall-through in the action switch statement, causing non-trailing-context rules to execute the trailing-context code of their fellow rules. This should be fixed, as it's a nasty bug and not obvious. The proper fix is for flex to spit out a FLEX_TRAILING_CONTEXT_USED #define and then have the backup logic in a separate table which is consulted for each rule-match, rather than as part of the rule action. The place to do the tweaking is in add_accept() - any kind soul want to be a hero? X.LP The pattern: X.nf X X x{3} X X.fi is considered to be variable-length for the purposes of trailing context, even though it has a clear fixed length. X.LP Due to both buffering of input and read-ahead, you cannot intermix calls to, for example, X.B getchar() with flex rules and expect it to work. Call X.B input() instead. X.LP The total table entries listed by the X.B -v flag excludes the number of table entries needed to determine what rule has been matched. The number of entries is equal to the number of DFA states if the scanner was not compiled with X.B -r, and greater than the number of states if it was. X.LP The scanner run-time speeds have not been optimized as much as they deserve. Van Jacobson's work shows that the can go quite a bit faster still. END_OF_FILE if test 15558 -ne `wc -c <'flex.1'`; then echo shar: \"'flex.1'\" unpacked with wrong size! fi # end of 'flex.1' fi if test -f 'flexdef.h' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'flexdef.h'\" else echo shar: Extracting \"'flexdef.h'\" $17327 characters$ sed "s/^X//" >'flexdef.h' <<'END_OF_FILE' X/* X * Definitions for flex. X * X * modification history X * -------------------- X * 02b kg, vp 30sep87 .added definitions for fast scanner; misc. cleanup X * 02a vp 27jun86 .translated into C/FTL X */ X X/* X * Copyright (c) 1987, the University of California X * X * The United States Government has rights in this work pursuant to X * contract no. DE-AC03-76SF00098 between the United States Department of X * Energy and the University of California. X * X * This program may be redistributed. Enhancements and derivative works X * may be created provided the new works, if made available to the general X * public, are made available for use by anyone. X */ X X#include <stdio.h> X X#ifdef SV X#include <string.h> X#define bzero(s, n) memset((char *)(s), '\000', (unsigned)(n)) X#else X#include <strings.h> X#endif X char *sprintf(); /* keep lint happy */ X X X/* maximum line length we'll have to deal with */ X#define MAXLINE BUFSIZ X X/* maximum size of file name */ X#define FILENAMESIZE 1024 X X#define min(x,y) (x < y ? x : y) X#define max(x,y) (x > y ? x : y) X X#define true 1 X#define false 0 X X X#ifndef DEFAULT_SKELETON_FILE X#define DEFAULT_SKELETON_FILE "flex.skel" X#endif X X#ifndef FAST_SKELETON_FILE X#define FAST_SKELETON_FILE "flex.fastskel" X#endif X X/* special nxt[] action number for the "at the end of the input buffer" state */ X/* note: -1 is already taken by YY_NEW_FILE */ X#define END_OF_BUFFER_ACTION -3 X/* action number for default action for fast scanners */ X#define DEFAULT_ACTION -2 X X/* special chk[] values marking the slots taking by end-of-buffer and action X * numbers X */ X#define EOB_POSITION -1 X#define ACTION_POSITION -2 X X/* number of data items per line for -f output */ X#define NUMDATAITEMS 10 X X/* number of lines of data in -f output before inserting a blank line for X * readability. X */ X#define NUMDATALINES 10 X X/* transition_struct_out() definitions */ X#define TRANS_STRUCT_PRINT_LENGTH 15 X X/* returns true if an nfa state has an epsilon out-transition slot X * that can be used. This definition is currently not used. X */ X#define FREE_EPSILON(state) \ X (transchar[state] == SYM_EPSILON && \ X trans2[state] == NO_TRANSITION && \ X finalst[state] != state) X X/* returns true if an nfa state has an epsilon out-transition character X * and both slots are free X */ X#define SUPER_FREE_EPSILON(state) \ X (transchar[state] == SYM_EPSILON && \ X trans1[state] == NO_TRANSITION) \ X X/* maximum number of NFA states that can comprise a DFA state. It's real X * big because if there's a lot of rules, the initial state will have a X * huge epsilon closure. X */ X#define INITIAL_MAX_DFA_SIZE 750 X#define MAX_DFA_SIZE_INCREMENT 750 X X/* array names to be used in generated machine. They're short because X * we write out one data statement (which names the array) for each element X * in the array. X */ X X#define ALIST 'l' /* points to list of rules accepted for a state */ X#define ACCEPT 'a' /* list of rules accepted for a state */ X#define ECARRAY 'e' /* maps input characters to equivalence classes */ X#define MATCHARRAY 'm' /* maps equivalence classes to meta-equivalence classes */ X#define BASEARRAY 'b' /* "base" array */ X#define DEFARRAY 'd' /* "default" array */ X#define NEXTARRAY 'n' /* "next" array */ X#define CHECKARRAY 'c' /* "check" array */ X X/* NIL must be 0. If not, its special meaning when making equivalence classes X * (it marks the representative of a given e.c.) will be unidentifiable X */ X#define NIL 0 X X#define JAM -1 /* to mark a missing DFA transition */ X#define NO_TRANSITION NIL X#define UNIQUE -1 /* marks a symbol as an e.c. representative */ X#define INFINITY -1 /* for x{5,} constructions */ X X/* size of input alphabet - should be size of ASCII set */ X#define CSIZE 127 X X#define INITIAL_MAXCCLS 100 /* max number of unique character classes */ X#define MAXCCLS_INCREMENT 100 X X/* size of table holding members of character classes */ X#define INITIAL_MAX_CCL_TBL_SIZE 500 X#define MAX_CCL_TBL_SIZE_INCREMENT 250 X X#define INITIAL_MNS 2000 /* default maximum number of nfa states */ X#define MNS_INCREMENT 1000 /* amount to bump above by if it's not enough */ X X#define INITIAL_MAX_DFAS 1000 /* default maximum number of dfa states */ X#define MAX_DFAS_INCREMENT 1000 X X#define JAMSTATE -32766 /* marks a reference to the state that always jams */ X X/* enough so that if it's subtracted from an NFA state number, the result X * is guaranteed to be negative X */ X#define MARKER_DIFFERENCE 32000 X#define MAXIMUM_MNS 31999 X X/* maximum number of nxt/chk pairs for non-templates */ X#define INITIAL_MAX_XPAIRS 2000 X#define MAX_XPAIRS_INCREMENT 2000 X X/* maximum number of nxt/chk pairs needed for templates */ X#define INITIAL_MAX_TEMPLATE_XPAIRS 2500 X#define MAX_TEMPLATE_XPAIRS_INCREMENT 2500 X X#define SYM_EPSILON 0 /* to mark transitions on the symbol epsilon */ X X#define INITIAL_MAX_SCS 40 /* maximum number of start conditions */ X#define MAX_SCS_INCREMENT 40 /* amount to bump by if it's not enough */ X X#define ONE_STACK_SIZE 500 /* stack of states with only one out-transition */ X#define SAME_TRANS -1 /* transition is the same as "default" entry for state */ X X/* the following percentages are used to tune table compression: X X * the percentage the number of out-transitions a state must be of the X * number of equivalence classes in order to be considered for table X * compaction by using protos X */ X#define PROTO_SIZE_PERCENTAGE 15 X X/* the percentage the number of homogeneous out-transitions of a state X * must be of the number of total out-transitions of the state in order X * that the state's transition table is first compared with a potential X * template of the most common out-transition instead of with the first X * proto in the proto queue X */ X#define CHECK_COM_PERCENTAGE 50 X X/* the percentage the number of differences between a state's transition X * table and the proto it was first compared with must be of the total X * number of out-transitions of the state in order to keep the first X * proto as a good match and not search any further X */ X#define FIRST_MATCH_DIFF_PERCENTAGE 10 X X/* the percentage the number of differences between a state's transition X * table and the most similar proto must be of the state's total number X * of out-transitions to use the proto as an acceptable close match X */ X#define ACCEPTABLE_DIFF_PERCENTAGE 50 X X/* the percentage the number of homogeneous out-transitions of a state X * must be of the number of total out-transitions of the state in order X * to consider making a template from the state X */ X#define TEMPLATE_SAME_PERCENTAGE 60 X X/* the percentage the number of differences between a state's transition X * table and the most similar proto must be of the state's total number X * of out-transitions to create a new proto from the state X */ X#define NEW_PROTO_DIFF_PERCENTAGE 20 X X/* the percentage the total number of out-transitions of a state must be X * of the number of equivalence classes in order to consider trying to X * fit the transition table into "holes" inside the nxt/chk table. X */ X#define INTERIOR_FIT_PERCENTAGE 15 X X/* size of region set aside to cache the complete transition table of X * protos on the proto queue to enable quick comparisons X */ X#define PROT_SAVE_SIZE 2000 X X#define MSP 50 /* maximum number of saved protos (protos on the proto queue) */ X X/* maximum number of out-transitions a state can have that we'll rummage X * around through the interior of the internal fast table looking for a X * spot for it X */ X#define MAX_XTIONS_FOR_FULL_INTERIOR_FIT 4 X X/* number that, if used to subscript an array, has a good chance of producing X * an error; should be small enough to fit into a short X */ X#define BAD_SUBSCRIPT -32767 X X/* absolute value of largest number that can be stored in a short, with a X * bit of slop thrown in for general paranoia. X */ X#define MAX_SHORT 32766 X X X/* Declarations for global variables. */ X X/* variables for symbol tables: X * sctbl - start-condition symbol table X * ndtbl - name-definition symbol table X * ccltab - character class text symbol table X */ X struct hash_entry X { X struct hash_entry *prev, *next; X char *name; X char *str_val; X int int_val; X } ; X typedef struct hash_entry *hash_table[]; X X#define NAME_TABLE_HASH_SIZE 101 X#define START_COND_HASH_SIZE 101 X#define CCL_HASH_SIZE 101 X extern struct hash_entry *ndtbl[NAME_TABLE_HASH_SIZE]; extern struct hash_entry *sctbl[START_COND_HASH_SIZE]; extern struct hash_entry *ccltab[CCL_HASH_SIZE]; X X X/* variables for flags: X * printstats - if true (-v), dump statistics X * syntaxerror - true if a syntax error has been found X * eofseen - true if we've seen an eof in the input file X * ddebug - if true (-d), make a "debug" scanner X * trace - if true (-T), trace processing X * spprdflt - if true (-s), suppress the default rule X * interactive - if true (-I), generate an interactive scanner X * caseins - if true (-i), generate a case-insensitive scanner X * useecs - if true (-ce flag), use equivalence classes X * fulltbl - if true (-cf flag), don't compress the DFA state table X * usemecs - if true (-cm flag), use meta-equivalence classes X * reject - if true (-r flag), generate tables for REJECT macro X * fullspd - if true (-F flag), use Jacobson method of table representation X * gen_line_dirs - if true (i.e., no -L flag), generate #line directives X */ X extern int printstats, syntaxerror, eofseen, ddebug, trace, spprdflt; extern int interactive, caseins, useecs, fulltbl, usemecs, reject; extern int fullspd, gen_line_dirs; X X X/* variables used in the flex input routines: X * datapos - characters on current output line X * dataline - number of contiguous lines of data in current data X * statement. Used to generate readable -f output X * skelfile - fd of the skeleton file X * yyin - input file X * temp_action_file - temporary file to hold actions X * action_file_name - name of the temporary file X * infilename - name of input file X * linenum - current input line number X */ X extern int datapos, dataline, linenum; extern FILE *skelfile, *yyin, *temp_action_file; extern char *infilename; extern char *action_file_name; X X X/* variables for stack of states having only one out-transition: X * onestate - state number X * onesym - transition symbol X * onenext - target state X * onedef - default base entry X * onesp - stack pointer X */ X extern int onestate[ONE_STACK_SIZE], onesym[ONE_STACK_SIZE]; extern int onenext[ONE_STACK_SIZE], onedef[ONE_STACK_SIZE], onesp; X X X/* variables for nfa machine data: X * current_mns - current maximum on number of NFA states X * accnum - number of the last accepting state X * firstst - physically the first state of a fragment X * lastst - last physical state of fragment X * finalst - last logical state of fragment X * transchar - transition character X * trans1 - transition state X * trans2 - 2nd transition state for epsilons X * accptnum - accepting number X * lastnfa - last nfa state number created X */ X extern int current_mns; extern int accnum, *firstst, *lastst, *finalst, *transchar; extern int *trans1, *trans2, *accptnum, lastnfa; X X X/* variables for protos: X * numtemps - number of templates created X * numprots - number of protos created X * protprev - backlink to a more-recently used proto X * protnext - forward link to a less-recently used proto X * prottbl - base/def table entry for proto X * protcomst - common state of proto X * firstprot - number of the most recently used proto X * lastprot - number of the least recently used proto X * protsave contains the entire state array for protos X */ X extern int numtemps, numprots, protprev[MSP], protnext[MSP], prottbl[MSP]; extern int protcomst[MSP], firstprot, lastprot, protsave[PROT_SAVE_SIZE]; X X X/* variables for managing equivalence classes: X * numecs - number of equivalence classes X * nextecm - forward link of Equivalence Class members X * ecgroup - class number or backward link of EC members X * nummecs - number of meta-equivalence classes (used to compress X * templates) X * tecfwd - forward link of meta-equivalence classes members X * tecbck - backward link of MEC's X */ X extern int numecs, nextecm[CSIZE + 1], ecgroup[CSIZE + 1], nummecs; extern int tecfwd[CSIZE + 1], tecbck[CSIZE + 1]; X X X/* variables for start conditions: X * lastsc - last start condition created X * current_max_scs - current limit on number of start conditions X * scset - set of rules active in start condition X * scbol - set of rules active only at the beginning of line in a s.c. X * scxclu - true if start condition is exclusive X * actvsc - stack of active start conditions for the current rule X */ X extern int lastsc, current_max_scs, *scset, *scbol, *scxclu, *actvsc; X X X/* variables for dfa machine data: X * current_max_dfa_size - current maximum number of NFA states in DFA X * current_max_xpairs - current maximum number of non-template xtion pairs X * current_max_template_xpairs - current maximum number of template pairs X * current_max_dfas - current maximum number DFA states X * lastdfa - last dfa state number created X * nxt - state to enter upon reading character X * chk - check value to see if "nxt" applies X * tnxt - internal nxt table for templates X * base - offset into "nxt" for given state X * def - where to go if "chk" disallows "nxt" entry X * tblend - last "nxt/chk" table entry being used X * firstfree - first empty entry in "nxt/chk" table X * dss - nfa state set for each dfa X * dfasiz - size of nfa state set for each dfa X * dfaacc - accepting set for each dfa state (or accepting number, if X * -r is not given) X * accsiz - size of accepting set for each dfa state X * dhash - dfa state hash value X * todo - queue of DFAs still to be processed X * todo_head - head of todo queue X * todo_next - next available entry on todo queue X * numas - number of DFA accepting states created; note that this X * is not necessarily the same value as accnum, which is the analogous X * value for the NFA X * numsnpairs - number of state/nextstate transition pairs X * jambase - position in base/def where the default jam table starts X * jamstate - state number corresponding to "jam" state X * end_of_buffer_state - end-of-buffer dfa state number X */ X extern int current_max_dfa_size, current_max_xpairs; extern int current_max_template_xpairs, current_max_dfas; extern int lastdfa, lasttemp, *nxt, *chk, *tnxt; extern int *base, *def, tblend, firstfree, **dss, *dfasiz; extern union dfaacc_union X { X int *dfaacc_set; X int dfaacc_state; X } *dfaacc; extern int *accsiz, *dhash, *todo, todo_head, todo_next, numas; extern int numsnpairs, jambase, jamstate; extern int end_of_buffer_state; X X/* variables for ccl information: X * lastccl - ccl index of the last created ccl X * current_maxccls - current limit on the maximum number of unique ccl's X * cclmap - maps a ccl index to its set pointer X * ccllen - gives the length of a ccl X * cclng - true for a given ccl if the ccl is negated X * cclreuse - counts how many times a ccl is re-used X * current_max_ccl_tbl_size - current limit on number of characters needed X * to represent the unique ccl's X * ccltbl - holds the characters in each ccl - indexed by cclmap X */ X extern int lastccl, current_maxccls, *cclmap, *ccllen, *cclng, cclreuse; extern int current_max_ccl_tbl_size; extern char *ccltbl; X X X/* variables for miscellaneous information: X * starttime - real-time when we started X * endtime - real-time when we ended X * nmstr - last NAME scanned by the scanner X * sectnum - section number currently being parsed X * nummt - number of empty nxt/chk table entries X * hshcol - number of hash collisions detected by snstods X * dfaeql - number of times a newly created dfa was equal to an old one X * numeps - number of epsilon NFA states created X * eps2 - number of epsilon states which have 2 out-transitions X * num_reallocs - number of times it was necessary to realloc() a group X * of arrays X * tmpuses - number of DFA states that chain to templates X * totnst - total number of NFA states used to make DFA states X * peakpairs - peak number of transition pairs we had to store internally X * numuniq - number of unique transitions X * numdup - number of duplicate transitions X * hshsave - number of hash collisions saved by checking number of states X */ X extern char *starttime, *endtime, nmstr[MAXLINE]; extern int sectnum, nummt, hshcol, dfaeql, numeps, eps2, num_reallocs; extern int tmpuses, totnst, peakpairs, numuniq, numdup, hshsave; X char *allocate_array(), *reallocate_array(); X X#define allocate_integer_array(size) \ X (int *) allocate_array( size, sizeof( int ) ) X X#define reallocate_integer_array(array,size) \ X (int *) reallocate_array( (char *) array, size, sizeof( int ) ) X X#define allocate_integer_pointer_array(size) \ X (int **) allocate_array( size, sizeof( int * ) ) X X#define allocate_dfaacc_union(size) \ X (union dfaacc_union *) \ X allocate_array( size, sizeof( union dfaacc_union ) ) X X#define reallocate_integer_pointer_array(array,size) \ X (int **) reallocate_array( (char *) array, size, sizeof( int * ) ) X X#define reallocate_dfaacc_union(array, size) \ X (union dfaacc_union *) reallocate_array( (char *) array, size, sizeof( union dfaacc_union ) ) X X#define allocate_character_array(size) allocate_array( size, sizeof( char ) ) X X#define reallocate_character_array(array,size) \ X reallocate_array( array, size, sizeof( char ) ) X X X/* used to communicate between scanner and parser. The type should really X * be YYSTYPE, but we can't easily get our hands on it. X */ extern int yylval; END_OF_FILE if test 17327 -ne `wc -c <'flexdef.h'`; then echo shar: \"'flexdef.h'\" unpacked with wrong size! fi # end of 'flexdef.h' fi if test -f 'nfa.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'nfa.c'\" else echo shar: Extracting \"'nfa.c'\" $13633 characters$ sed "s/^X//" >'nfa.c' <<'END_OF_FILE' X/* nfa - NFA construction routines */ X X/* X * Copyright (c) 1987, the University of California X * X * The United States Government has rights in this work pursuant to X * contract no. DE-AC03-76SF00098 between the United States Department of X * Energy and the University of California. X * X * This program may be redistributed. Enhancements and derivative works X * may be created provided the new works, if made available to the general X * public, are made available for use by anyone. X */ X X#include "flexdef.h" X X/* add_accept - add an accepting state to a machine X * X * synopsis X * X * add_accept( mach, headcnt, trailcnt ); X * X * the global ACCNUM is incremented and the new value becomes mach's X * accepting number. if headcnt or trailcnt is non-zero then the machine X * recognizes a pattern with trailing context. headcnt is the number of X * characters in the matched part of the pattern, or zero if the matched X * part has variable length. trailcnt is the number of trailing context X * characters in the pattern, or zero if the trailing context has variable X * length. X */ X add_accept( mach, headcnt, trailcnt ) int mach, headcnt, trailcnt; X X { X int astate; X X fprintf( temp_action_file, "case %d:\n", ++accnum ); X X if ( headcnt > 0 || trailcnt > 0 ) X { /* do trailing context magic to not match the trailing characters */ X fprintf( temp_action_file, X "YY_DO_BEFORE_SCAN; /* undo effects of setting up yytext */\n" ); X X if ( headcnt > 0 ) X { X int head_offset = headcnt - 1; X X if ( fullspd || fulltbl ) X /* with the fast skeleton, yy_c_buf_p points to the *next* X * character to scan, rather than the one that was last X * scanned X */ X ++head_offset; X X if ( head_offset > 0 ) X fprintf( temp_action_file, "yy_c_buf_p = yy_b_buf_p + %d;\n", X head_offset ); X X else X fprintf( temp_action_file, "yy_c_buf_p = yy_b_buf_p;\n" ); X } X X else X fprintf( temp_action_file, "yy_c_buf_p -= %d;\n", trailcnt ); X X fprintf( temp_action_file, "YY_DO_BEFORE_ACTION; /* set up yytext again */\n" ); X } X X line_directive_out( temp_action_file ); X X /* hang the accepting number off an epsilon state. if it is associated X * with a state that has a non-epsilon out-transition, then the state X * will accept BEFORE it makes that transition, i.e., one character X * too soon X */ X X if ( transchar[finalst[mach]] == SYM_EPSILON ) X accptnum[finalst[mach]] = accnum; X X else X { X astate = mkstate( SYM_EPSILON ); X accptnum[astate] = accnum; X mach = link_machines( mach, astate ); X } X } X X X/* copysingl - make a given number of copies of a singleton machine X * X * synopsis X * X * newsng = copysingl( singl, num ); X * X * newsng - a new singleton composed of num copies of singl X * singl - a singleton machine X * num - the number of copies of singl to be present in newsng X */ X int copysingl( singl, num ) int singl, num; X X { X int copy, i; X X copy = mkstate( SYM_EPSILON ); X X for ( i = 1; i <= num; ++i ) X copy = link_machines( copy, dupmachine( singl ) ); X X return ( copy ); X } X X X/* dumpnfa - debugging routine to write out an nfa X * X * synopsis X * int state1; X * dumpnfa( state1 ); X */ X dumpnfa( state1 ) int state1; X X { X int sym, tsp1, tsp2, anum, ns; X X fprintf( stderr, "\n\n********** beginning dump of nfa with start state %d\n", X state1 ); X X /* we probably should loop starting at firstst[state1] and going to X * lastst[state1], but they're not maintained properly when we "or" X * all of the rules together. So we use our knowledge that the machine X * starts at state 1 and ends at lastnfa. X */ X X /* for ( ns = firstst[state1]; ns <= lastst[state1]; ++ns ) */ X for ( ns = 1; ns <= lastnfa; ++ns ) X { X fprintf( stderr, "state # %4d\t", ns ); X X sym = transchar[ns]; X tsp1 = trans1[ns]; X tsp2 = trans2[ns]; X anum = accptnum[ns]; X X fprintf( stderr, "%3d: %4d, %4d", sym, tsp1, tsp2 ); X X if ( anum != NIL ) X fprintf( stderr, " [%d]", anum ); X X fprintf( stderr, "\n" ); X } X X fprintf( stderr, "********** end of dump\n" ); X } X X X/* dupmachine - make a duplicate of a given machine X * X * synopsis X * X * copy = dupmachine( mach ); X * X * copy - holds duplicate of mach X * mach - machine to be duplicated X * X * note that the copy of mach is NOT an exact duplicate; rather, all the X * transition states values are adjusted so that the copy is self-contained, X * as the original should have been. X * X * also note that the original MUST be contiguous, with its low and high X * states accessible by the arrays firstst and lastst X */ X int dupmachine( mach ) int mach; X X { X int i, state, init, last = lastst[mach], state_offset; X X for ( i = firstst[mach]; i <= last; ++i ) X { X state = mkstate( transchar[i] ); X X if ( trans1[i] != NO_TRANSITION ) X { X mkxtion( finalst[state], trans1[i] + state - i ); X X if ( transchar[i] == SYM_EPSILON && trans2[i] != NO_TRANSITION ) X mkxtion( finalst[state], trans2[i] + state - i ); X } X X accptnum[state] = accptnum[i]; X } X X state_offset = state - i + 1; X X init = mach + state_offset; X firstst[init] = firstst[mach] + state_offset; X finalst[init] = finalst[mach] + state_offset; X lastst[init] = lastst[mach] + state_offset; X X return ( init ); X } X X X/* link_machines - connect two machines together X * X * synopsis X * X * new = link_machines( first, last ); X * X * new - a machine constructed by connecting first to last X * first - the machine whose successor is to be last X * last - the machine whose predecessor is to be first X * X * note: this routine concatenates the machine first with the machine X * last to produce a machine new which will pattern-match first first X * and then last, and will fail if either of the sub-patterns fails. X * FIRST is set to new by the operation. last is unmolested. X */ X int link_machines( first, last ) int first, last; X X { X if ( first == NIL ) X return ( last ); X X else if ( last == NIL ) X return ( first ); X X else X { X mkxtion( finalst[first], last ); X finalst[first] = finalst[last]; X lastst[first] = max( lastst[first], lastst[last] ); X firstst[first] = min( firstst[first], firstst[last] ); X X return ( first ); X } X } X X X/* mkbranch - make a machine that branches to two machines X * X * synopsis X * X * branch = mkbranch( first, second ); X * X * branch - a machine which matches either first's pattern or second's X * first, second - machines whose patterns are to be or'ed (the | operator) X * X * note that first and second are NEITHER destroyed by the operation. Also, X * the resulting machine CANNOT be used with any other "mk" operation except X * more mkbranch's. Compare with mkor() X */ X int mkbranch( first, second ) int first, second; X X { X int eps; X X if ( first == NO_TRANSITION ) X return ( second ); X X else if ( second == NO_TRANSITION ) X return ( first ); X X eps = mkstate( SYM_EPSILON ); X X mkxtion( eps, first ); X mkxtion( eps, second ); X X return ( eps ); X } X X X/* mkclos - convert a machine into a closure X * X * synopsis X * new = mkclos( state ); X * X * new - a new state which matches the closure of "state" X */ X int mkclos( state ) int state; X X { X return ( mkopt( mkposcl( state ) ) ); X } X X X/* mkopt - make a machine optional X * X * synopsis X * X * new = mkopt( mach ); X * X * new - a machine which optionally matches whatever mach matched X * mach - the machine to make optional X * X * notes: X * 1. mach must be the last machine created X * 2. mach is destroyed by the call X */ X int mkopt( mach ) int mach; X X { X int eps; X X if ( ! SUPER_FREE_EPSILON(finalst[mach]) ) X { X eps = mkstate( SYM_EPSILON ); X mach = link_machines( mach, eps ); X } X X /* can't skimp on the following if FREE_EPSILON(mach) is true because X * some state interior to "mach" might point back to the beginning X * for a closure X */ X eps = mkstate( SYM_EPSILON ); X mach = link_machines( eps, mach ); X X mkxtion( mach, finalst[mach] ); X X return ( mach ); X } X X X/* mkor - make a machine that matches either one of two machines X * X * synopsis X * X * new = mkor( first, second ); X * X * new - a machine which matches either first's pattern or second's X * first, second - machines whose patterns are to be or'ed (the | operator) X * X * note that first and second are both destroyed by the operation X * the code is rather convoluted because an attempt is made to minimize X * the number of epsilon states needed X */ X int mkor( first, second ) int first, second; X X { X int eps, orend; X X if ( first == NIL ) X return ( second ); X X else if ( second == NIL ) X return ( first ); X X else X { X /* see comment in mkopt() about why we can't use the first state X * of "first" or "second" if they satisfy "FREE_EPSILON" X */ X eps = mkstate( SYM_EPSILON ); X X first = link_machines( eps, first ); X X mkxtion( first, second ); X X if ( SUPER_FREE_EPSILON(finalst[first]) && X accptnum[finalst[first]] == NIL ) X { X orend = finalst[first]; X mkxtion( finalst[second], orend ); X } X X else if ( SUPER_FREE_EPSILON(finalst[second]) && X accptnum[finalst[second]] == NIL ) X { X orend = finalst[second]; X mkxtion( finalst[first], orend ); X } X X else X { X eps = mkstate( SYM_EPSILON ); X X first = link_machines( first, eps ); X orend = finalst[first]; X X mkxtion( finalst[second], orend ); X } X } X X finalst[first] = orend; X return ( first ); X } X X X/* mkposcl - convert a machine into a positive closure X * X * synopsis X * new = mkposcl( state ); X * X * new - a machine matching the positive closure of "state" X */ X int mkposcl( state ) int state; X X { X int eps; X X if ( SUPER_FREE_EPSILON(finalst[state]) ) X { X mkxtion( finalst[state], state ); X return ( state ); X } X X else X { X eps = mkstate( SYM_EPSILON ); X mkxtion( eps, state ); X return ( link_machines( state, eps ) ); X } X } X X X/* mkrep - make a replicated machine X * X * synopsis X * new = mkrep( mach, lb, ub ); X * X * new - a machine that matches whatever "mach" matched from "lb" X * number of times to "ub" number of times X * X * note X * if "ub" is INFINITY then "new" matches "lb" or more occurrences of "mach" X */ X int mkrep( mach, lb, ub ) int mach, lb, ub; X X { X int base, tail, copy, i; X X base = copysingl( mach, lb - 1 ); X X if ( ub == INFINITY ) X { X copy = dupmachine( mach ); X mach = link_machines( mach, link_machines( base, mkclos( copy ) ) ); X } X X else X { X tail = mkstate( SYM_EPSILON ); X X for ( i = lb; i < ub; ++i ) X { X copy = dupmachine( mach ); X tail = mkopt( link_machines( copy, tail ) ); X } X X mach = link_machines( mach, link_machines( base, tail ) ); X } X X return ( mach ); X } X X X/* mkstate - create a state with a transition on a given symbol X * X * synopsis X * X * state = mkstate( sym ); X * X * state - a new state matching sym X * sym - the symbol the new state is to have an out-transition on X * X * note that this routine makes new states in ascending order through the X * state array (and increments LASTNFA accordingly). The routine DUPMACHINE X * relies on machines being made in ascending order and that they are X * CONTIGUOUS. Change it and you will have to rewrite DUPMACHINE (kludge X * that it admittedly is) X */ X int mkstate( sym ) int sym; X X { X if ( ++lastnfa >= current_mns ) X { X if ( (current_mns += MNS_INCREMENT) >= MAXIMUM_MNS ) X lerrif( "input rules are too complicated (>= %d NFA states)", X current_mns ); X X ++num_reallocs; X X transchar = reallocate_integer_array( transchar, current_mns ); X trans1 = reallocate_integer_array( trans1, current_mns ); X trans2 = reallocate_integer_array( trans2, current_mns ); X accptnum = reallocate_integer_array( accptnum, current_mns ); X firstst = reallocate_integer_array( firstst, current_mns ); X finalst = reallocate_integer_array( finalst, current_mns ); X lastst = reallocate_integer_array( lastst, current_mns ); X } X X transchar[lastnfa] = sym; X trans1[lastnfa] = NO_TRANSITION; X trans2[lastnfa] = NO_TRANSITION; X accptnum[lastnfa] = NIL; X firstst[lastnfa] = lastnfa; X finalst[lastnfa] = lastnfa; X lastst[lastnfa] = lastnfa; X X /* fix up equivalence classes base on this transition. Note that any X * character which has its own transition gets its own equivalence class. X * Thus only characters which are only in character classes have a chance X * at being in the same equivalence class. E.g. "a|b" puts 'a' and 'b' X * into two different equivalence classes. "[ab]" puts them in the same X * equivalence class (barring other differences elsewhere in the input). X */ X X if ( sym < 0 ) X { X /* we don't have to update the equivalence classes since that was X * already done when the ccl was created for the first time X */ X } X X else if ( sym == SYM_EPSILON ) X ++numeps; X X else X { X if ( useecs ) X mkechar( sym, nextecm, ecgroup ); X } X X return ( lastnfa ); X } X X X/* mkxtion - make a transition from one state to another X * X * synopsis X * X * mkxtion( statefrom, stateto ); X * X * statefrom - the state from which the transition is to be made X * stateto - the state to which the transition is to be made X */ X mkxtion( statefrom, stateto ) int statefrom, stateto; X X { X if ( trans1[statefrom] == NO_TRANSITION ) X trans1[statefrom] = stateto; X X else if ( (transchar[statefrom] != SYM_EPSILON) || X (trans2[statefrom] != NO_TRANSITION) ) X flexfatal( "found too many transitions in mkxtion()" ); X X else X { /* second out-transition for an epsilon state */ X ++eps2; X trans2[statefrom] = stateto; X } X } END_OF_FILE if test 13633 -ne `wc -c <'nfa.c'`; then echo shar: \"'nfa.c'\" unpacked with wrong size! fi # end of 'nfa.c' fi echo shar: End of archive 3 $of 5$. cp /dev/null ark3isdone MISSING="" for I in 1 2 3 4 5 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have unpacked all 5 archives. rm -f ark[1-9]isdone else echo You still need to unpack the following archives: echo " " ${MISSING} fi ## End of shell archive. exit 0